Abstract: THIS PAPER IS ELIGIBLE FOR THE STUDENT PAPER AWARD.
We consider distributed computation of functions of distributed
data in random planar networks with noisy wireless links. We
present a new algorithm for computation of the maximum value
which is order optimal in the number of transmissions and
computation time. We also adapt the histogram computation
algorithm of Ying et al. [1] to make the histogram computation
time optimal.

I. Introduction 

We consider distributed or 'in-network' computation of 
functions of sensing data in sensor networks in two dimen- 
sions. The sensor nodes collect sensing data and communicate 
with other nodes in a limited range over noisy wireless links. 
Our interest is in efficient evaluation of specific functions of 
sensing data. Latency, energy cost and throughput are the 
efficiency measures. We assume that communication costs 
dominate the time and energy cost for the computations. 

Two types of wireless networks are considered in the literature
studying function computation: random planar networks [1]-[3]
and collocated or broadcast networks [2], [4]-[6]. Results for
collocated networks are useful in solving corresponding problems
for random planar networks. While [2], [3] consider noise-free
links, [1], [4]-[6] consider noisy links. Computation of the
histogram, and hence that of any symmetric function, in a random
planar network with noisy links is considered in [1]. A protocol
requiring $\Theta(n \log\log n)$ transmissions to compute the
histogram under the strong noise model is described. The lower
bound is open. [1] is the only study of distributed computation
in a noisy random planar network and it does not obtain the
computation time (which is not the same as the energy for a
random planar network). In this paper, we show how to effectively
use the coding theorem of [7] to reduce the computation time in
noisy random planar networks for maximum (MAX) and histogram
functions. Also, our protocol for MAX requires $\Theta(n)$
transmissions.

II. Model and Problem Description 

Sensor nodes $1, 2, \ldots, n$ are uniformly distributed in
$[0,1]^2$. The transmission range of all the nodes is $r_n$. We
use the protocol model of interference [8]. Let $\rho_{ij}$ be
the distance between nodes $i$ and $j$. A transmission of node
$i$ can be received at node $j$ if $\rho_{ij} < r_n$ and, for
every other simultaneously transmitting node $k$,
$\rho_{kj} > (1+\Delta) r_n$, for some constant $\Delta$.
If $\rho_{ij} < r_n$ and there exists a node $k$ that is
transmitting at the same time as node $i$ such that
$\rho_{kj} < (1+\Delta) r_n$, then there is a collision at
node $j$.
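
The reception rule of the protocol model can be written as a short predicate. The Python sketch below is illustrative only: the name `can_receive`, the coordinate dictionary `pos`, and the parameter names are assumptions introduced here, not notation from the paper. It also treats a node that is itself transmitting as unable to receive, which is an extra assumption.

```python
import math

def can_receive(pos, i, j, transmitters, r_n, delta):
    """Return True if node j receives node i's transmission under the protocol model.

    pos maps a node id to its (x, y) coordinates; transmitters is the set of
    nodes transmitting in this slot (hypothetical representation).
    """
    def dist(a, b):
        return math.dist(pos[a], pos[b])

    if i not in transmitters or dist(i, j) >= r_n:
        return False  # i is silent, or j is out of range of i
    # Every other simultaneous transmitter must lie outside the guard zone of j.
    # If j itself transmits, its distance to itself is 0 and reception fails.
    return all(dist(k, j) > (1 + delta) * r_n
               for k in transmitters if k != i)
```

With three collinear nodes at 0, 0.1 and 0.5 and range 0.2, node 1 hears node 0 while node 2 transmits only if the guard factor keeps node 2 outside the zone of radius $(1+\Delta) r_n$ around node 1.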

Time is slotted and all nodes are synchronized to these slots.
The slot duration is equal to the bit duration. At time $t$,
sensor node $i$ has data $x_i(t)$, with $x_i(t)$ taking values
from a finite set. For simplicity, we assume $x_i(t)$ is
$\{0,1\}$-valued. (A general finite set only changes the results
by a constant factor.) Define
$x(t) := [x_1(t), x_2(t), \ldots, x_n(t)]$. Our interest is in
efficiently computing $f(t) := \phi(x(t))$ at some instants of
time. $f(t)$ is to be made available at a single sink node $s$,
which is a node close to the center of the unit square. In
general, $f(t)$ may be required at a subset of the nodes.

There are many ways in which the computation of $f(t)$ for
different $t$ can be scheduled. Three such computation models are
usually discussed in the literature. (1) One-shot computation is
a one-time computation of $f(\cdot)$ and, without loss of
generality, we consider only $f(0)$. (2) In pipelined
computation, $f(t)$ is to be computed for $\{t_k\}_{k=1,2,\ldots}$,
with the computations for the different $t_k$ being pipelined at
each of the nodes. (3) In block computation, $x(t)$ is collected
for a block of, say, $K$ instants, i.e., for
$\{t_k\}_{k=1,\ldots,K}$, and $f(t)$ is computed for this block
of time.

The wireless links in the network are assumed to be binary
symmetric channels with errors independent across receivers.
We assume that the error probability is upper bounded by some
$\epsilon_0 < 0.5$. Thus, our protocols work for the strong
clairvoyant adversary model [9]. Note that the protocols in [1],
[4], [5] also work for this noise model. A weaker noise model
where each bit is in error with probability exactly $\epsilon_0$
has also been considered in the literature, e.g., [10].

Different distributed scheduling strategies are possible. In 
a collision free strategy (CFS), transmissions are scheduled 
such that in the absence of noise, there is no collision at the 
intended receivers. Also, information cannot be communicated 
by a node by avoiding a transmission in its scheduled time slot. 
This means that even though our model for the noisy network 
does not allow bit erasures, the protocols designed are robust 
in their presence. Erasures, if they occur, can be immediately 
identified as errors for such protocols. A CFS may be oblivious 
or non-oblivious. In an oblivious protocol, the transmission 
schedule is fixed beforehand. In a non-oblivious protocol, 
the data values transmitted in previous slots determine the 



evolution of the schedule. A non-oblivious CFS designed for a 
noiseless network can result in collisions when used in a noisy 
network. An oblivious protocol ensures that such collisions do 
not occur. We look for oblivious protocols. 

Let $E_t$ be the energy consumed for transmission of one bit and
$E_r$ be the energy consumed at the receiver of a transmission.
We ignore processing energy. In general, $E_t$ depends on $r_n$,
and hence on $n$. However, results in [3] indicate that for
sufficiently large $n$ ($> 5000$), it is reasonable to assume
that the transmission energy is constant and independent of $n$.
We follow this assumption. Note that this is different from the
energy model of [1]. If we let $E_r \ll E_t$, the number of
transmissions required in the protocol can be used as a proxy
for the energy consumed. Energy model EM1 will refer to the case
when $E_r$ is not to be ignored and EM2 will refer to the case
when it can be ignored. In EM2, we simply count the number of
transmissions. Multiplying the energy in EM2 by
$k_5 \left(\sqrt{\frac{\log n}{n}}\right)^{\alpha}$ for some
constants $k_5$ and $\alpha$ gives the energy in the model
of [1].

As was mentioned earlier, we will consider the evaluation 
of the maximum (MAX) and histogram of x(t). Note that the 
MAX is the same as the OR function for binary data. 

A. Summary of Results and Organization of the Paper 

We describe a single oblivious protocol for computation of 
MAX in a noisy network that matches the lower bounds for 
both time and transmissions of the optimal CFS in a noise 
free network. This may be compared against the log log n 
penalty in number of transmissions that seems inevitable for 
the histogram function [1]. We also show how to achieve the 
trivial lower bound in time for histogram computation in a 
noisy random planar network, using a suitable modification of 
the algorithm presented in [1]. The number of transmissions 
is the same as that in [1], i.e., $\Theta(n \log\log n)$.

III. MAX in a Noisy Random Planar Network 

As in [3], we wish to evaluate $f$ = MAX/OR. Following [1]-[3],
we propose a two-stage algorithm which is implemented by dividing
the operational area into appropriately sized cells, performing
intra-cell computation, or fusion, and then propagating the fused
results towards the sink with appropriate combination on the way.

Following [11], we perform a tessellation of the unit square
into square cells of side $l = \sqrt{\frac{2.75 \log n}{n}}$.
This gives us a total of
$M_n = \left\lceil \sqrt{\frac{n}{2.75 \log n}} \right\rceil^2$
cells. Label the cells as $S_j$, $j = 1, 2, \ldots, M_n$, from
left to right and from top to bottom as shown in Fig. 1. Let
$N_j$ denote the number of sensor nodes in cell $S_j$. Of course,
$\sum_{j=1}^{M_n} N_j = n$. Choose
$r_n = \sqrt{\frac{13.75 \log n}{n}} = \sqrt{5}\, l$. Since
$l < \frac{r_n}{\sqrt{2}}$, the nodes in each of the cells form a
single-hop network. Lemma 3.1 of [11] provides us with useful
bounds on the number of nodes in each cell.
Lemma 1: For any $\mu > \mu^* = 0.9669$,

$$\lim_{n \to \infty} \Pr\left( \max_j \left| N_j - 2.75 \log n \right| \le 2.75 \mu \log n \right) = 1. \qquad (1)$$


Fig. 1. The tessellation of the unit square with labeling of cells. The spanning 
tree used in inter-cell communication is also shown. 



The above lemma means that for $j = 1, 2, \ldots, M_n$, with
probability 1, as $n \to \infty$,

$$0.091 \log n \le N_j \le 5.41 \log n. \qquad (2)$$
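
The occupancy bounds in (2) can be explored numerically. The sketch below is a hedged illustration, not part of the paper: `cell_counts`, its seed argument, the rounding of the number of cells per row, and the use of the natural logarithm are all assumptions made here. The two constants follow from 2.75(1 ∓ 0.9669).

```python
import math
import random

def cell_counts(n, seed=0):
    """Drop n uniform points in the unit square and count the points per cell."""
    rng = random.Random(seed)
    # Cells per row, chosen so each cell has side close to sqrt(2.75 log n / n).
    per_row = math.ceil(math.sqrt(n / (2.75 * math.log(n))))
    counts = [0] * (per_row * per_row)
    for _ in range(n):
        x, y = rng.random(), rng.random()
        col = min(int(x * per_row), per_row - 1)
        row = min(int(y * per_row), per_row - 1)
        counts[row * per_row + col] += 1  # cells labelled left to right, row by row
    return counts

MU_STAR = 0.9669
LOWER = 2.75 * (1 - MU_STAR)  # about 0.091, the constant in the lower bound of (2)
UPPER = 2.75 * (1 + MU_STAR)  # about 5.41, the constant in the upper bound of (2)
```

Calling `cell_counts(n)` for increasing `n` and comparing `min(counts)` and `max(counts)` with `LOWER * math.log(n)` and `UPPER * math.log(n)` gives a feel for how quickly the asymptotic bounds take hold; the lemma itself only guarantees them in the limit.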

We form a spanning tree of the cells rooted at cell $S_s$ with
edges along the central vertical axis and along all horizontal
rows of cells as shown in Fig. 1. Let $\pi(j)$ denote the index
of the parent of cell $j$, i.e., cell $S_{\pi(j)}$ is the parent
of $S_j$, $j = 1, 2, \ldots, s-1, s+1, \ldots, M_n$. We
arbitrarily select one node in each cell $S_j$ to be the cell
center $C_j$ (except that the sink node is made the cell center
of $S_s$). This is possible with high probability since (2) tells
us that each cell is occupied. The cell centers of two adjacent
cells are one-hop neighbors, since $l \le \frac{r_n}{\sqrt{5}}$.

A. Stage 1: Intra-cell Computation 

At the end of the first stage, the maximum of the values 
in each cell Sj is to be made available to Cj. Since each 
cell behaves like a single-hop network, we can use a noisy 
broadcast network protocol for this purpose. 

It helps to review the protocol for a noiseless random planar
network [3]. Each node transmits its bit to the cell center.
From (2), it is sufficient to give each cell
$\lfloor 5.41 \log n \rfloor$ time slots. During the computation
in each cell, at most
$k_1 := \left( 2 \left\lceil \frac{(1+\Delta) r_n}{l} \right\rceil + 1 \right)^2 - 1$,
i.e., a constant number of adjacent cells will be disabled due to
interference. Thus, with high probability, this stage will
require at most $(k_1 + 1) \lfloor 5.41 \log n \rfloor$ slots,
i.e., $\Theta(\log n)$ time using $\Theta(n)$ total transmissions.
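
To make the slot count concrete, the sketch below evaluates the number of interfering cells and the resulting noiseless Stage 1 slot budget. It rests on several assumptions that are labeled here because the formula is hard to read in the source: the count of interfering cells is taken to be $(2\lceil (1+\Delta) r_n / l \rceil + 1)^2 - 1$, the ratio $r_n / l$ is taken to be $\sqrt{5}$, the logarithm is natural, and both function names are hypothetical.

```python
import math

def interfering_cells(delta, r_over_l=math.sqrt(5)):
    """Count neighbouring cells inside the interference zone of a cell (assumed formula)."""
    reach = math.ceil((1 + delta) * r_over_l)  # guard distance measured in cell sides
    return (2 * reach + 1) ** 2 - 1            # surrounding square of cells, minus the cell itself

def stage1_slots_noiseless(n, delta):
    """Slots needed when mutually interfering cells take turns, one slot per node."""
    per_cell = math.floor(5.41 * math.log(n))  # upper bound on nodes per cell from (2)
    return (interfering_cells(delta) + 1) * per_cell
```

Because `interfering_cells` does not depend on `n`, the slot budget grows only logarithmically in `n`, which is the point of the argument above.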

For a noisy network, [1] describes a protocol with which the
aggregation in a cell can be performed in
$\Theta(\log n \log\log n)$ slots (with scheduling of cells as
above) using $\Theta(n \log\log n)$ total transmissions, with an
overall probability of error that vanishes as $n$ grows. Using
this intra-cell protocol, we can collect the histogram of the
cell sensor readings at the cell center. However, this is more
information than we need. Instead, we adapt the algorithm of [5]
to reduce the complexity of the intra-cell protocol to $O(n)$
transmissions. We explain this adaptation next.

Define the 'witness' for cell $S_j$, $W_j$, as follows.
$W_j = i_0$ if $i_0$ is the minimum $i$ for which $x_i = 1$ and
$i \in S_j$. If $x_i = 0$ for all $i \in S_j$, an arbitrary node
can be designated as $W_j$.



Each cell is a single-hop network and we run the 'witness
discovery' protocol described in the proof of Theorem 2 of [5].
From Theorem 2 of [5], witness discovery in $S_j$ can be achieved
by an oblivious protocol with fewer than $k_2 N_j$
bit-transmissions in $S_j$, for some constant $k_2$. At the end
of the execution of this protocol, $C_j$ identifies some
$\hat{W}_j$ as the witness for $S_j$ such that
$\Pr(\hat{W}_j \neq W_j) \le \epsilon_1$ for any desired constant
$\epsilon_1 > 0$. We can therefore achieve
$\Pr\left(x_{\hat{W}_j} \neq \max_{i \in S_j}\{x_i\}\right) \le \epsilon_1$
for any $\epsilon_1 > 0$, i.e., $\hat{W}_j$ has the correct value
of the maximum of the values in cell $j$ with error probability
that can be upper bounded by any constant. If $\epsilon$ is the
desired bound for the error probability in computing $f$, we
choose $\epsilon_1 = \frac{\epsilon}{2}$ for each cell.

From Observation 2.1 in [5], $C_j$ can distribute the identity of
$\hat{W}_j$ to all nodes in $S_j$ using $\Theta(\log n)$
transmissions to ensure that the identity of $\hat{W}_j$ is known
to all nodes in the cell with error probability
$O\left(\frac{1}{n}\right)$. Finally, $\hat{W}_j$ transmits
$x_{\hat{W}_j}$ $\Theta(\log n)$ times to $C_j$ to ensure that the
probability of $C_j$ having the wrong value of $x_{\hat{W}_j}$ is
$O\left(\frac{1}{n}\right)$. Let $f_j$ denote the value of
$x_{\hat{W}_j}$ decoded by $C_j$ from these transmissions. Note
the use of $\Theta(\log n)$ transmissions and not
$\Theta(\log N_j)$ transmissions in the last two steps to ensure
that the error probability is $O\left(\frac{1}{n}\right)$ for
each of the cells.
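
The effect of repeating a bit many times over a binary symmetric channel can be checked with a small calculation. The sketch below is illustrative and makes simplifying assumptions that are not in the paper: each copy is flipped with probability exactly `eps` (the weaker noise model), the receiver decodes by majority vote, and the function names are hypothetical.

```python
import math

def majority_error(eps, m):
    """Exact probability that a majority vote over m noisy copies of a bit is wrong.

    Each copy is flipped independently with probability eps; m is assumed odd
    so that ties cannot occur.
    """
    return sum(math.comb(m, k) * eps**k * (1 - eps)**(m - k)
               for k in range((m + 1) // 2, m + 1))

def repetitions_for(eps, target):
    """Smallest odd m whose majority-vote error is at most target."""
    m = 1
    while majority_error(eps, m) > target:
        m += 2
    return m
```

For a fixed `eps` below 0.5 the error decays exponentially in `m`, so a number of repetitions proportional to log n is enough to push the per-cell error down to a polynomially small quantity in n.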

Thus, $\Theta(\log n) + k_2 N_j$ transmissions suffice to complete
Stage 1 in cell $S_j$. Using (2), we can find constants $k_3$ and
$k_4$ such that Stage 1 requires
$\alpha_j N_j = \Theta(\log n) + k_2 N_j$ transmissions in $S_j$
with $k_3 \le \alpha_j \le k_4$ $\forall j = 1, 2, \ldots, M_n$.
This, along with the bound on the number of interfering
neighbors for a cell, means that Stage 1 can be completed in the
entire network in $\Theta(\log n)$ time using $\Theta(n)$
transmissions.

While the probability of error in correctly identifying $W_j$ in
each cell $j$ is upper bounded by a constant, we do not yet know
a 'network-wide' error probability. But then how do we define
the error in the network? An obvious definition of a
'network-error' is the event of there being an error in
identifying the witness in one or more cells. A little
consideration shows that a more appropriate definition of the
'error' would be as follows. Let
$f := \max_{1 \le i \le n}\{x_i\}$ be the true value of the MAX
function, and let $\hat{f} := \max_{1 \le j \le M_n}\{f_j\}$. We
will say that $\mathcal{T}_1$, a Stage 1 error, has occurred if
$\hat{f} \neq f$. We show that the probability of $\mathcal{T}_1$
is bounded by a constant for large $n$.

An error in Stage 1 (as defined above) occurs in two ways.

1) $f = 1$ and $f_j = 0$ $\forall j$. In this case, there must be
   a cell $S_j$ that has at least one 1. Now the probability of
   missing all $x_i = 1$ for $i \in S_j$ in identifying the
   witness for cell $S_j$ is bounded by $\frac{\epsilon}{2}$. The
   probability of finding a witness $\hat{W}_j$ with
   $x_{\hat{W}_j} = 1$ but having $f_j = 0$ is
   $O\left(\frac{1}{n}\right)$. Thus the probability of having
   $f_j = 0$ is bounded by
   $\frac{\epsilon}{2} + O\left(\frac{1}{n}\right)$. This gives us
   a bound of $\frac{\epsilon}{2} + O\left(\frac{1}{n}\right)$ on
   the probability of the event $\mathcal{T}_1$.

2) $f = 0$ and $f_j = 1$ for one or more $j$. Since $f = 0$, we
   have $x_{\hat{W}_j} = 0$ $\forall j$. The probability of
   obtaining the wrong value of $f_j$ is
   $O\left(\frac{1}{n}\right)$ for $1 \le j \le M_n$. Since there
   are $M_n = \Theta\left(\frac{n}{\log n}\right)$ cells, from the
   union bound, the probability of $\mathcal{T}_1$ is
   $O\left(\frac{1}{\log n}\right)$.

Thus the error probability for Stage 1 is bounded by
$\frac{\epsilon}{2} + O\left(\frac{1}{\log n}\right)$.

Using the proof technique of Theorem 4 of [5], an oblivious
protocol can be designed to achieve the same asymptotic time and
transmission complexity as the query-based protocol. The initial
phase, in which each node transmits its data value a constant
number of times and is heard by all other nodes in its cell,
requires $\Theta(n \log n)$ energy under EM1. We already know
that the energy is $O(n \log n)$ under EM1, since each
transmission can be heard by at most $\Theta(\log n)$ nodes.
Hence, the total energy consumption of the first stage is
$\Theta(n \log n)$ under EM1. Because of the phase in which the
identity of $W_j$ is distributed, this is also the energy
consumption for the non-oblivious version described earlier.

Lemma 2: Stage 1 can be completed by an oblivious protocol in
time $\Theta(\log n)$ using $\Theta(n)$ transmissions (same as
energy under EM2) with energy consumption $\Theta(n \log n)$
under EM1, such that the overall probability of error is bounded
by $\frac{\epsilon}{2} + O\left(\frac{1}{\log n}\right)$.

B. Stage 2: Inter-cell Computation 

As in [1]-[3], in Stage 2, data is passed along the tree towards
the sink, first horizontally and then vertically between the
cell centers $C_j$ (see Fig. 1). Let
$\bar{f}_j = \max_{j'}\{f_{j'}\}$, where $S_{j'}$ are the cells in
the subtree rooted at cell $S_j$, excluding cell $S_j$.
$\max\{f_j, \bar{f}_j\}$ is passed to the cell center of the
parent cell, $C_{\pi(j)}$. Data thus moves along the spanning
tree towards the sink.
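
The noiseless aggregation rule for Stage 2 can be sketched in a few lines. The code below is an illustration under assumptions introduced here: the tree is given as a hypothetical `parent` dictionary, the Stage 1 results as a dictionary `f`, and the recursion simply stands in for the cell-to-cell forwarding; it does not model noise, scheduling, or coding.

```python
def aggregate_max(parent, f, sink):
    """Propagate the maximum of Stage 1 results from the leaves to the sink cell.

    parent maps each non-sink cell to its parent cell; f maps each cell to its
    Stage 1 result. Returns the value available at the sink's cell center.
    """
    children = {j: [] for j in f}
    for j, p in parent.items():
        children[p].append(j)

    def up(j):
        # Value a cell forwards: its own bit combined with what its subtree reported.
        return max([f[j]] + [up(c) for c in children[j]])

    return up(sink)
```

For example, with `parent = {1: 0, 2: 0, 3: 1}` and a single 1 held by cell 3, the value 1 travels through cell 1 and reaches the sink cell 0.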

In this stage, parallel communication of data can reduce the
time complexity. Recall that when $C_j$ transmits, at most $k_1$
other adjacent cell centers will not be able to receive other
transmissions in the same slot. Note that we need bidirectional
flow of data along the edges of the tree to use [7]. Since the
vertices of the tree have a maximum degree of $d = 4$, for an
active link, at most $k_5 = 4(k_1 + 1)$ links incident on the
cells in the neighborhood will be disabled. Thus, all links can
be activated at least once each in at most $k_5$ slots. We can
therefore consider each link in the spanning tree to be a 1-bit
per 'logical slot' link, with each logical slot corresponding to
$k_5$ real slots. Hereafter, we consider only logical time slots.

It is easy to see that the coding theorem of [7] is applicable 
to the logical network defined above. 

Theorem 1: (Theorem 1.1 of [7]) Any protocol which runs in time
$T$ on an $N$-processor network of degree $d$ having noiseless
communication channels can, if the channels are noisy (each a
binary symmetric channel of capacity $C$, $0 < C \le 1$), be
simulated on that network in time
$O\left(T \frac{1}{C} \log(d+1) + \frac{1}{C} \log N\right)$ with
probability of error $e^{-\Omega(T)}$.


Remark 1: The protocol described in [7] to achieve this uses
tree codes. [12] gives the best available construction of tree
codes: a randomized construction that works correctly with
probability arbitrarily close to 1. The computation required for
maximum likelihood decoding used in the proof of Theorem 1 grows
exponentially with the depth of the tree and an efficient method
for decoding still needs to be found to make it computationally
tractable. We ignore computational expense in this paper.

Fig. 2. Sub-stages in the Inter-cell protocol. Part of the unit square is shown.
S denotes the cell containing the sink. The figure marks a width of
$\Theta(\log n)$ columns.

The depth of the tree is
$\Theta\left(\sqrt{\frac{n}{\log n}}\right)$ and there are
$\Theta\left(\frac{n}{\log n}\right)$ nodes in the tree. Hence
Stage 2 would require $\Theta\left(\sqrt{\frac{n}{\log n}}\right)$
time slots and $O\left(\frac{n}{\log n}\right)$ transmissions in
a noiseless network. We can modify this protocol as follows to
make it possible to use Theorem 1 to simulate it in a noisy
network: for each time slot in which a link is not activated,
send a dummy bit on the link (say 0). Now, from Theorem 1, this
protocol can be simulated in the noisy network with only a
constant factor increase in time because we have $d = 4$.
However, a naive implementation of the protocol of [7] would need
$\Theta\left(\left(\frac{n}{\log n}\right)^{3/2}\right)$
transmissions, compared to $\Theta\left(\frac{n}{\log n}\right)$
transmissions for the noiseless network. There is an increase by
a factor of $\Theta\left(\sqrt{\frac{n}{\log n}}\right)$, i.e.,
$\Theta\left(\frac{1}{r_n}\right)$, because the protocol of [7]
requires every $C_j$ to transmit in every slot, while in the
noiseless network, each would transmit just once.

We can reduce the number of transmissions by securing the
transmissions at each level of the tree by repetition coding,
i.e., transmitting $\Theta(\log n)$ times on each link. This
increases the number of transmissions to $\Theta(n)$, an increase
by a factor of only $\Theta(\log n)$ from the noiseless case.
Since each node has to collect the data from the tree below it,
we can use information theoretic arguments to show that even if
we interleave transmissions at different depths of the tree in
time, we cannot do better than this in terms of number of
transmissions. Note that the overall number of transmissions for
the maximum computation is now just $\Theta(n)$, i.e., it is
optimal. However, there is a penalty of $\Theta(\log n)$ in time,
which is inflated to $\Theta\left(\sqrt{n \log n}\right)$.

Ideally, we would like a way to complete the Stage 2 computation
in $O(n)$ transmissions (at most as many transmissions as
Stage 1) and in $\Theta\left(\sqrt{\frac{n}{\log n}}\right)$ time,
i.e., we would like to not have a penalty either in time or in
number of transmissions. Surprisingly, this is possible.

This can be achieved by propagation of bits up the tree in a
series of sub-stages, moving $\Theta(\log n)$ levels (ignoring
edge effects) in $\Theta(\log n)$ time in each sub-stage. The
first set of sub-stages brings data to the central axis, and the
second set takes it to the sink. There are
$\Theta\left(\frac{\sqrt{n}}{(\log n)^{3/2}}\right)$ sub-stages.
See Fig. 2; 1 and 2 in the figure correspond to sub-stages.



At each sub-stage we have
$O\left(\sqrt{\frac{n}{\log n}}\right)$ linear array networks
with $\Theta(\log n)$ nodes each. From Theorem 1, we can ensure
that the error probability is $O\left(\frac{1}{n}\right)$ for
each linear array. This gives a bound of
$O\left(\frac{1}{\sqrt{n \log n}}\right)$ for the sub-stage error
probability. We have an overall union bound of
$O\left(\frac{1}{(\log n)^2}\right)$ for the error probability of
Stage 2. Note that a vanishing error probability would be
impossible if each sub-stage involved $o(\log n)$ levels of the
tree in an attempt to reduce the number of transmissions without
increasing the time. Clearly, each of the
$\Theta\left(\frac{n}{\log n}\right)$ links is activated
$\Theta(\log n)$ times in this scheme, i.e., $\Theta(n)$
transmissions are required in total. Each transmission is to be
heard by a single receiver, hence the energy under both EM1 and
EM2 is $\Theta(n)$.
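
The union-bound arithmetic behind these orders can be written out explicitly. The block below is a consistency check rather than a proof, and it assumes the following counts, which are our reading of the scheme: $O(\sqrt{n/\log n})$ linear arrays per sub-stage, a per-array error of $O(1/n)$, and $\Theta(\sqrt{n}/(\log n)^{3/2})$ sub-stages (the tree depth divided by the $\Theta(\log n)$ levels covered per sub-stage).

```latex
\begin{align*}
\Pr(\text{sub-stage error})
  &\le O\!\left(\sqrt{\tfrac{n}{\log n}}\right) \cdot O\!\left(\tfrac{1}{n}\right)
   = O\!\left(\tfrac{1}{\sqrt{n \log n}}\right), \\
\Pr(\text{Stage 2 error})
  &\le \Theta\!\left(\tfrac{\sqrt{n}}{(\log n)^{3/2}}\right)
       \cdot O\!\left(\tfrac{1}{\sqrt{n \log n}}\right)
   = O\!\left(\tfrac{1}{(\log n)^{2}}\right), \\
\#\,\text{transmissions}
  &= \Theta\!\left(\tfrac{n}{\log n}\right) \cdot \Theta(\log n) = \Theta(n).
\end{align*}
```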

Lemma 3: Stage 2 can be completed by an oblivious protocol in
$\Theta\left(\sqrt{\frac{n}{\log n}}\right)$ time using
$\Theta(n)$ transmissions (same as energy under EM2) with energy
consumption $\Theta(n)$ under EM1, such that the overall
probability of error is bounded by
$O\left(\frac{1}{(\log n)^2}\right)$.


The total error for the computation is
$\frac{\epsilon}{2} + O\left(\frac{1}{\log n}\right) + O\left(\frac{1}{(\log n)^2}\right)$.
Combining Lemmas 2 and 3:

Theorem 2: In a random planar network with binary data, with
binary symmetric channels and errors independent across
receivers with error probability bounded by some
$\epsilon_0 < \frac{1}{2}$, with high probability the MAX (or OR)
can be computed with probability of error less than any
$\epsilon > 0$ for large $n$ using $\Theta(n)$ total
transmissions with transmission radius
$\Theta\left(\sqrt{\frac{\log n}{n}}\right)$ and in a time
$\Theta\left(\sqrt{\frac{n}{\log n}}\right)$.

This matches the corresponding results for noiseless networks
in [3]. The total energy for the protocol described is
$\Theta(n \log n)$ under EM1 and $\Theta(n)$ under EM2.

IV. Discussion 

A. Pipelining 

It is easy to see that a pipelined throughput of
$\Theta\left(\frac{1}{\log n}\right)$ with a delay of
$\Theta\left(\sqrt{\frac{n}{\log n}}\right)$ in reaching the sink
is also achievable in the noisy case with a scheme similar to
the one given in [3] for the noiseless case. We alternately
perform iterations of Stage 1 and Stage 2, spending
$\Theta(\log n)$ time on each. While doing Stage 2, we allow for
one sub-stage of each of the ongoing computations to be
completed. Note that different sub-stages of different
computations can essentially be done in parallel.



B. Distribution of computation result 

The result of the MAX computation can be distributed to all 
nodes with only a constant factor overhead in time and number 
of transmissions. It can be distributed to the cell centers along 
the same spanning tree in a series of sub-stages as in Stage 2 
(Section III-B). Thereafter, each cell center can broadcast the
value $\Theta(\log n)$ times so that all nodes have the correct value
of the function with error probability O 

C. Non-binary data 

All results can easily be generalized to the case in which
sensor data belongs to any finite set. Suppose
$x_i \in \mathcal{X} = \{1, 2, \ldots, |\mathcal{X}|\}$
$\forall i = 1, 2, \ldots, n$. We can complete the computation of
MAX in $\log_2 |\mathcal{X}|$ stages. Each stage consists of two
parts. Briefly, the first part is as described in Section III
and reduces the number of possibilities for the result by a
factor of 2. The second part involves distribution of the result
of the first part as in Section IV-B. Thus the computation can
be completed using $\Theta(n \log |\mathcal{X}|)$ total
transmissions in a time
$\Theta\left(\sqrt{\frac{n}{\log n}} \log |\mathcal{X}|\right)$.
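
The halving argument for non-binary data amounts to a binary search driven by OR computations. The sketch below is a hedged illustration: `max_via_or` and its threshold rule are assumptions made here, each loop iteration stands for one stage (an OR over the network followed by distribution of the answer), and `any(...)` is a noiseless stand-in for the protocol of Section III.

```python
def max_via_or(values, alphabet_size):
    """Find max(values) over {1, ..., alphabet_size} using only OR queries.

    Each loop iteration stands for one stage: a network-wide OR of the bits
    [x_i > mid], followed by distribution of the answer to all nodes.
    """
    lo, hi = 1, alphabet_size
    rounds = 0
    while lo < hi:
        mid = (lo + hi) // 2
        if any(x > mid for x in values):  # stand-in for the OR computation
            lo = mid + 1                  # the maximum lies in the upper half
        else:
            hi = mid                      # the maximum lies in the lower half
        rounds += 1
    return lo, rounds
```

For instance, `max_via_or([3, 1, 7, 2], 8)` returns `(7, 3)`: three OR stages suffice for an alphabet of size 8.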

D. Block Computation 

The results of Giridhar and Kumar [2] carry over trivially to
the case of noisy networks. This is because in block
computation, Gallager's coding theorem can be used to secure
each individual message transmission with a constant factor
overhead, provided the message length is long enough. This can
be ensured simply by taking sufficiently long blocks. MAX
belongs to the class of type-threshold functions and can hence
be computed at a rate of $\Theta\left(\frac{1}{\log\log n}\right)$.

V. Optimizing the time of histogram computation 

The protocol for histogram computation with binary data in a
noisy random planar network in [1] takes $\Theta(n \log\log n)$
transmissions. Computation time is ignored. Clearly, the time
required by any correct protocol for histogram computation must
be $\Omega\left(\sqrt{\frac{n}{\log n}}\right)$ (as is true even
for MAX). We describe an algorithm that takes
$\Theta\left(\sqrt{\frac{n}{\log n}}\right)$ time slots.



Firstly, we construct a spanning tree as in Fig. 1. This ensures
that the depth of the tree is limited to
$\Theta\left(\sqrt{\frac{n}{\log n}}\right)$. This is in contrast
with an arbitrarily constructed spanning tree, which can have a
depth as large as $\Theta\left(\frac{n}{\log n}\right)$. Even
with the spanning tree of Fig. 1, an implementation of the
Inter-cell protocol described in [1] requires $\Theta(\log n)$
time for each level of the tree, yielding a total time
requirement of $\Theta\left(\sqrt{n \log n}\right)$. The number of
transmissions needed is $\Theta(n)$. We show how to bring the
time required down to $\Theta\left(\sqrt{\frac{n}{\log n}}\right)$
while using the same number of transmissions.

Once again we perform Stage 2 in a series of sub-stages as shown
in Fig. 2. Each cell needs to convey the number of 1's in its
sub-tree of cells (including the cell itself) to its parent.
This information can be represented by a binary string of size
$g = \lceil \log n \rceil$ for each cell (by adding appropriate
zeros to the left). Let this binary string for cell $S_j$ be
$b^j_{g-1} b^j_{g-2} \cdots b^j_0$, where each $b^j_i$ is a bit.
Consider part of a sub-stage with the linear array of cells
$S_{a_1}, S_{a_2}, \ldots, S_{a_q}$, where $q = \Theta(\log n)$
(ignoring edge effects). The cells are ordered with $S_{a_1}$
being the cell at the greatest depth and $S_{a_q}$ being the cell
closest to the sink. The noiseless protocol to be simulated
using Theorem 1 is constructed as follows. $C_{a_1}$ transmits
$b^{a_1}_0$ up to the cell center of its parent, $C_{a_2}$, in the
first time slot. Other links have dummy bits exchanged. Before
the second slot, $C_{a_2}$ computes $b^{a_2}_0$ using the number
of 1's in cell $S_{a_2}$ and the received bit $b^{a_1}_0$. In the
second slot, $C_{a_1}$ sends $b^{a_1}_1$ to $C_{a_2}$, and
$C_{a_2}$ sends $b^{a_2}_0$ to $C_{a_3}$. Dummy bits are sent over
other links. Before the third slot, $C_{a_2}$ computes
$b^{a_2}_1$ and $C_{a_3}$ computes $b^{a_3}_0$. Progressing this
way, at the end of $q + g - 1 = \Theta(\log n)$ slots, $C_{a_q}$
has received all required data from $C_{a_{q-1}}$. Now, this
protocol can be simulated in a noisy setting using Theorem 1.
The sub-stage can be completed in $\Theta(\log n)$ time, using
$\Theta(\log n)$ transmissions for each cell center (as for the
MAX computation). Moreover, we can ensure that the overall
probability of error goes to zero as for the MAX computation.
Thus, the inter-cell protocol can be completed in
$\Theta\left(\sqrt{\frac{n}{\log n}}\right)$ time overall, using
only $\Theta(n)$ transmissions.
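
The pipelined, bit-by-bit addition along a linear array can be simulated in the noiseless setting. The sketch below is an illustration under assumptions made here: `pipelined_sum` is a hypothetical name, cells are indexed from the deepest one, each cell center keeps a one-bit carry, bits travel least significant first, and a slot is counted for every round in which some cell produces a bit. It does not model the tree-code simulation of the noisy case.

```python
def pipelined_sum(counts, g):
    """Bit-serial, pipelined addition of per-cell counts along a linear array.

    counts[k] is the number of 1's contributed by the k-th cell, ordered from
    the deepest cell to the one closest to the sink. Every partial sum must fit
    in g bits. Returns (total held by the last cell, number of slots used).
    """
    q = len(counts)
    assert sum(counts) < 2 ** g, "g bits are not enough for the total"
    carry = [0] * q                 # carry kept by each cell center between slots
    out = [[] for _ in range(q)]    # bits of the running sum per cell, LSB first
    slots = 0
    while len(out[-1]) < g:
        # Visit cells from the sink side so that each cell only uses bits
        # that its child produced in an earlier slot.
        for k in range(q - 1, -1, -1):
            i = len(out[k])         # index of the next bit this cell must produce
            if i >= g:
                continue            # this cell is done; it sends dummy bits
            if k == 0:
                incoming = 0        # the deepest cell has nothing to receive
            elif len(out[k - 1]) > i:
                incoming = out[k - 1][i]
            else:
                continue            # bit i has not arrived yet; a dummy bit is sent
            s = ((counts[k] >> i) & 1) + incoming + carry[k]
            out[k].append(s & 1)
            carry[k] = s >> 1
        slots += 1
    total = sum(bit << i for i, bit in enumerate(out[-1]))
    return total, slots
```

With `counts = [3, 5, 2]` and `g = 4`, the call returns `(10, 6)`: the last cell holds the sum 10 after q + g - 1 = 6 slots, in line with the slot count quoted above.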

VI. Conclusion 

A non-trivial lower bound is yet to be proved for the number of
transmissions required for histogram computation. [6]
demonstrates that $\Theta(n)$ transmissions suffice in a noisy
broadcast network (in a weak noise model). However, for a noisy
random planar network, the best known algorithm requires
$\Theta(n \log\log n)$ transmissions, whereas the trivial lower
bound is $\Omega(n)$. It would be interesting to close the gap.

Also note that the computational complexity of the algo- 
rithms that we describe here can be significant, given that an 
efficient method for decoding of tree codes is not known. 